A. INTRODUCTION

Presently  we  don’t  know  which  auditory  processing  principles  are  behind  the  level  of  performance 
humans  demonstrate  in  the  task  of  speech  reception  in  noise.  One  way  to  improve  our  capabilities  in 
designing  speech-processing  systems  that  will  be  effective  in  the  presence  of  background  noise  is  to 
advance  our  understanding  of  how  the  auditory  periphery  operates  in  such  environments,  and  to 
translate  this  understanding  into  a  computational  model.  Our  definition  of  the  auditory  periphery  is 
restricted  to  the  processing  which  takes  place  prior  to  lexical  access,  on  speech  segments  that  are  as 
long as 100 ms (i.e., as long as the duration of a dyad¹). Currently, we have a reasonable understanding
of  the  processing  principles  in  the  ascending  auditory  pathway  up  through  the  auditory  nerve  (AN)  [e.g., 
the  cochlea:  the  inner  hair  cells  (IHC)]  and  an  increasing  understanding  of  brain-stem  nuclei  (such  as  the 
cochlear  nucleus,  the  superior  olivary  complex,  the  inferior  colliculus).  We  have  limited  understanding  of 
the  descending  pathway  [mainly  the  MOC  to  outer-hair-cells  (OHC)  feedback]  and  very  little 
understanding  of  how  the  ascending  and  the  descending  pathways  interact. 

A  key  observation  is  that  human  performance  in  tasks  related  to  speech  intelligibility  deteriorates  only 
modestly with worsening environmental conditions, even for tasks with a minimal cognitive load (i.e., no
contextual information is available, and the only source of information available to the listener is the AN
firing patterns). In contrast, simulated AN representations, generated by state-of-the-art auditory models,
deteriorate at a much faster rate. One may attribute the robust performance of human listeners to the
existence  of  (1)  mechanisms  that  stabilize  the  AN  firing  patterns  in  the  presence  of  noise,  resulting  in  a 
representation  of  acoustic-phonemic  information  which  is  more  consistent  with  the  representation  in  a 
quiet  background,  and  (2)  efficient  peripheral  post-AN  mechanisms  that  are  capable  of  extracting 
important  acoustic-phonemic  cues  even  from  noisy  AN  patterns.  Our  underlying  assumption  is  that  the 
stabilizing  mechanism  and  the  post-AN  mechanisms  work  in  concert  in  providing  the  observed  graceful 
degradation  of  human  performance  in  noise.  We  suggest  that  the  success  of  post-AN  mechanisms  in 
reliably  extracting  speech-related  information  in  noise  is  partly  due  to  the  “stabilizing”  effect  of  the 
efferent  system.  Current  models  of  the  periphery  are  based  upon  the  ascending  pathway  up  through  the 
AN. We envision a model of the periphery that also incorporates the descending pathway and the way
the ascending and the descending pathways interact.

One  auditory  mechanism  that  may  play  a  role  in  regulating  cochlear  mechanics  is  the  medial 
olivocochlear  (MOC)  efferent  feedback  system.  This  report  describes  our  efforts  at  quantifying  the 
possible  role  of  this  system  in  speech  reception  in  the  presence  of  background  noise. 

B.  BACKGROUND 

B.2  MOC  efferents:  morphology  and  physiology 

Numerous  papers  have  been  published  providing  detailed  morphological  and  neurophysiological 
description  of  the  MOC  efferent  feedback  system  (e.g.,  Gifford  and  Guinan,  1983;  Guinan,  1996;  Kawase 
and  Liberman,  1993;  Liberman,  1988;  Liberman  and  Brown  1986;  May  and  Sachs,  1992;  Warr,  1978; 
Winslow  and  Sachs,  1988).  MOC  efferents  originate  from  neurons  medial,  ventral  and  anterior  to  the 
medial  superior  olivary  nucleus  (MSO),  have  myelinated  axons,  and  terminate  directly  on  Outer  Hair 
Cells (OHC). Medial efferents project predominantly to the contralateral cochlea; the innervation is
largest near the center of the cochlea, with the crossed innervation biased toward the base compared to
the uncrossed innervation (e.g., Guinan, 1996). Roughly two-thirds of medial efferents respond to
ipsilateral sound, one-third to contralateral sound, and a small fraction to sound in either ear. Medial
efferents have tuning curves that are similar to, or slightly wider than, those of AN fibers (e.g., Liberman
and Brown, 1986), and they project to different places along the cochlear partition in a tonotopic
manner. Finally, medial efferents have longer latencies and group delays than AN fibers. In response to
tone  or  noise  bursts,  most  MOC  efferents  have  latencies  of  10-40ms.  Group  delays  measured  from 
modulation transfer functions are much more tightly clustered, averaging about 8 ms (Gummer et al.,
1988). We currently do not have a clear understanding of the functional role of this mechanism. A few
suggestions have been offered, such as a shift of sound-level functions to higher sound levels, an
antimasking effect on responses to transient sounds in a continuous masker, and prevention of damage due to
intense sound (e.g., Guinan, 1996). One speculated role, which is of particular interest for this proposal,
is a dynamic regulation of the cochlear operating point depending on background acoustic stimulation,
resulting in robust human performance in perceiving speech in a noisy background (e.g., Kiang et al.,
1987). There are a few neurophysiological studies to support this role. Using anesthetized cats with noisy
acoustic stimuli, Winslow and Sachs (1988), for example, showed that by stimulating the MOC nerve
bundle electrically, the dynamic range of discharge rate at the AN is partly recovered. Measuring neural
responses of awake cats to noisy acoustic stimuli, May and Sachs (1992) showed that the dynamic
range of discharge rate at the AN level is only moderately affected by changes in levels of background
noise.

¹ An acoustic segment from the midpoint of one phoneme to the midpoint of the adjacent phoneme.

B.3 MOC efferents: psychophysics - speech and speech-like stimuli

A few behavioral studies indicate the potential role of the MOC efferent system in perceiving speech in the
presence of background noise. Dewson (1968) presented evidence that MOC lesions impair the abilities
of monkeys to discriminate the vowel sounds [i] and [u] in the presence of masking noise but have no
effect on the performance of this task in quiet. More recently, Giraud et al. (1997) and Zeng et al. (2000)
showed that phoneme perception in humans with severed MOC feedback deteriorates
when the speech is presented in a noisy background. Hienz et al. (1998) confirmed the classical findings
of Dewson (1968) and the more recent observations of Giraud et al. (1997); they showed that
de-efferentation of the cochlea produces vowel discrimination deficits in cats, particularly when performance
was measured in the presence of high levels of background noise.

C.  PRESENT  STUDY 

We  collected  data  by  psychophysical  experiments  using  subjects  with  normal  hearing.  A  controlled 
activation  of  selected  parts  of  the  efferent  system  was  achieved  by  the  use  of  judiciously  designed 
stimulus  conditions. 

Phone  discrimination  in  noise  -  subjects  with  normal  hearing 

We  conducted  phoneme  discrimination  experiments  using  speech  with  a  minimal  context,  hence 
focusing  on  the  role  of  the  auditory  periphery  (by  reducing  the  role  of  higher  auditory  layers  to  a 
minimum).  Toward  this  end,  we  used  the  Diagnostic  Rhyme  Test  (DRT;  Voiers,  1983),  which  uses  the 
one-interval  two-alternative  forced-choice  paradigm  as  the  administrative  procedure.  The  properties  of 
the DRT are reviewed in Sec. C.1.a. Our assumptions about the functioning of the MOC efferent system
in the presence of sustained background noise, which guided the design considerations of the proposed
experiments, are summarized in Sec. C.1.b. The definition of the resulting stimulus conditions and the
rationale behind their design are presented in Sec. C.1.c. The results and data analysis are summarized in
Sec. C.1.d.

C.1.a  Database  and  experimental  procedure  -  Voiers  DRT 

The  DRT  (Diagnostic  Rhyme  Test)  was  suggested  by  Voiers  (1983)  as  a  way  of  measuring  the 
intelligibility  of  processed  speech.  From  an  acoustic  point  of  view,  Voiers'  DRT  database  covers  initial 
dyads  of  spoken  CVCs.  The  database  consists  of  96  pairs  of  confusable  words  spoken  in  isolation. 

Words in a pair differ only in their initial consonants. The dyads are equally distributed among 6
acoustic-phonetic distinctive features and among 8 vowels (hence 2 word-pairs per [feature × vowel] bin). The
feature classification (outlined in Table 1) follows the binary system suggested by Jakobson, Fant and
Halle (Jakobson et al., 1952), and the vowels are [ee] and [it] (High-Front), [eh] and [at] (High-Back), [oo]
and [oh] (Low-Front) and [aw] and [ah] (Low-Back). In our version of the DRT the vowels are collapsed
into 4 quadrants (High-Front, High-Back, Low-Front, Low-Back), hence 4 word-pairs per
[feature × quadrant] bin. Driven by considerations stemming from efferent activation properties (see Sec.
C.1.c, topic 1.1), we truncated the duration of the consonantal part of each word stimulus to a maximum of
50 ms (measured backwards from the time instant of the consonantal-to-vocalic transition), as illustrated in
Fig. 1. Consequently, the sound quality of some word stimuli was degraded. The word-pairs associated
with these words were removed from the database, resulting in a database of 72 word-pairs evenly
distributed (i.e., 3 word-pairs per [feature × quadrant] bin).
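
The bin arithmetic in this paragraph can be checked with a short sketch. The function and variable names are illustrative only and are not part of the DRT specification; the counts are the ones stated in the text.

```python
def word_pairs_total(n_features: int, n_vowel_bins: int, pairs_per_bin: int) -> int:
    """Total word-pairs when every [feature x vowel-bin] cell holds the same count."""
    return n_features * n_vowel_bins * pairs_per_bin

# Original DRT layout: 6 features x 8 vowels, 2 word-pairs per bin.
original = word_pairs_total(6, 8, 2)
# Vowels collapsed into 4 quadrants: 4 word-pairs per bin, same total.
collapsed = word_pairs_total(6, 4, 4)
# After removing word-pairs degraded by truncation: 3 word-pairs per bin.
truncated = word_pairs_total(6, 4, 3)

print(original, collapsed, truncated)  # 96 96 72
```

Collapsing the 8 vowels into 4 quadrants leaves the total unchanged; only the removal of degraded word-pairs reduces the database from 96 to 72.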


The  psychophysical  procedure  is  very  carefully  controlled,  to  assure  a  task  with  minimum  cognitive  load. 
The  listeners  are  well  trained  and  are  very  familiar  with  the  database,  including  the  voice  quality  of  the 
individual  speakers.  The  experiment  is  a  one-interval  two-alternative  forced-choice  experiment.  First,  the 
subject  is  presented  visually  with  a  pair  of  rhymed  words.  Then,  one  word  of  the  pair  (selected  at 
random)  is  presented  aurally  and  the  subject  is  required  to  indicate  which  of  the  two  words  was  played. 
This procedure is repeated until all the words in the database have been presented. In our version of the
DRT, words were played sequentially, one every 2.5 seconds; the visual presentation preceded the aural
presentation by 1 s, and the (binary) decision was made within 1 s of the aural presentation. Words
in the database were divided into sessions, and the overall duration of one session was limited to about
2.5 minutes.
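
The trial timing described above can be laid out as a small sketch. It assumes that the visual display marks the start of each 2.5-second trial period and that the 1-second response window is counted from the onset of the aural word; neither detail is stated explicitly in the text, and all names are hypothetical.

```python
TRIAL_PERIOD_S = 2.5         # one word every 2.5 s (from the text)
VISUAL_LEAD_S = 1.0          # word pair shown 1 s before the aural word
RESPONSE_WINDOW_S = 1.0      # decision within 1 s of the aural presentation
SESSION_LIMIT_S = 2.5 * 60   # a session is limited to about 2.5 minutes

def trial_events(trial_index: int) -> dict:
    """Event times (seconds from session start) for one trial.

    Assumption: the visual display opens the trial period.
    """
    t0 = trial_index * TRIAL_PERIOD_S
    return {
        "visual_on": t0,
        "aural_on": t0 + VISUAL_LEAD_S,
        "response_deadline": t0 + VISUAL_LEAD_S + RESPONSE_WINDOW_S,
    }

# Upper bound on the number of words that fit in one session.
max_trials = int(SESSION_LIMIT_S // TRIAL_PERIOD_S)
print(max_trials)        # 60
print(trial_events(1))
```

Under these assumptions a session holds at most about 60 words, and each response deadline falls 0.5 s before the next trial begins.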

C.1.b  Stimulus  conditions  and  presumed  efferent  activity  -  background 

The  design  of  the  stimuli  for  the  proposed  experiments  was  guided  by  the  observed  behavior  of  the  MOC 
efferent  system  in  the  presence  of  sustained  background  noise,  as  illustrated  in  Fig.  2.  The  ipsilateral  ear 
is  defined  as  the  ear  that  will  be  presented  with  the  speech  signal.  For  monaural  presentation  of  noise  to 
the ipsilateral ear (left panel), the information pathway relevant to MOC efferent activation consists of the
auditory nerve projection to the posteroventral subdivision of the cochlear nucleus (PVCN); the MOC
reflex interneurons at the PVCN, likely to be chopper units (Brown et al., 2003), which project, across the
midline, to MOC neurons at the medial superior olive (MSO) whose axons cross back (via the COCB
nerve bundle) to the ipsilateral cochlea, innervating the outer hair cells (OHCs). Important
neurophysiological observations are: (1) MOC neurons project to the OHCs in a tonotopic manner, (2) the
effect on the OHCs occurs about 100 ms after stimulus onset, and (3) the activity of the MOC neurons in
background noise is sustained (i.e., no significant rate adaptation; Brown et al., 2003). For monaural
presentation of noise to the contralateral ear (right panel), the information pathway consists of the reflex
interneurons at the PVCN of the contralateral cochlear nucleus, which project, across the midline, to MOC
neurons at the medial superior olive (MSO) whose axons project to the ipsilateral cochlea. An important
observation is that the strength of the feedback, in terms of the number of MOC neurons excited by
monaural stimuli, is roughly 2:1 in favor of the ipsilateral pathway (e.g., Liberman, 1988). It is assumed
that for binaural presentation of noise the feedback strength sums to 1.²
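
The two quantitative assumptions of this section (a 2:1 ipsilateral-to-contralateral ratio with a binaural sum of 1, and an effect on the OHCs about 100 ms after onset) can be written as a minimal sketch. The hard 100 ms threshold, the common noise onset for both ears, and the function names are simplifying assumptions made here for illustration, not a model taken from the report.

```python
from fractions import Fraction

IPSI_TO_CONTRA_RATIO = (2, 1)  # roughly 2:1 in favor of the ipsilateral pathway
EFFERENT_LATENCY_MS = 100      # effect on the OHCs about 100 ms after onset

def pathway_strengths(ratio=IPSI_TO_CONTRA_RATIO):
    """Split a total binaural strength of 1 according to the ipsi:contra ratio."""
    ipsi, contra = ratio
    total = ipsi + contra
    return Fraction(ipsi, total), Fraction(contra, total)

def presumed_strength(noise_in_ipsi, noise_in_contra, ms_since_noise_onset):
    """Presumed feedback strength at the ipsilateral cochlea.

    Assumption: no effect before EFFERENT_LATENCY_MS, full effect afterwards.
    """
    strength = Fraction(0)
    if ms_since_noise_onset < EFFERENT_LATENCY_MS:
        return strength
    ipsi, contra = pathway_strengths()
    if noise_in_ipsi:
        strength += ipsi
    if noise_in_contra:
        strength += contra
    return strength

print(pathway_strengths())                # (Fraction(2, 3), Fraction(1, 3))
print(presumed_strength(True, True, 500)) # 1
```

With these numbers, monaural noise in the ipsilateral ear corresponds to a strength of 2/3, monaural noise in the contralateral ear to 1/3, and noise that has been on for less than 100 ms to 0.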

C.1.c Stimulus conditions and presumed efferent activity - definition

The stimulus conditions are illustrated in Fig. 3, and the presumed efferent activity they invoke is
summarized in Table 2.

1. Efferent activation.

1.1. Monaural presentation. Fig. 3(a) illustrates the baseline condition, [Sm-Gated], where the noise
turns on and off in sync with the word stimulus. The noise is 1 s long, and the gap
between two successive words is 2.5 seconds (all word stimuli in the database are less than 1
s in duration). In the DRT, words in a given word-pair differ in the initial CV dyad. We limited the
duration of the consonantal part of the dyad to a maximum of 50 ms (see Sec. C.1.a). Recalling
that the effect of the MOC efferents on the OHCs occurs about 100 ms after stimulus onset, the 50 ms
long consonantal part and the immediately following 50 ms long vocalic, coarticulated part of the initial CV
dyad are, therefore, presumed to be processed by a cochlea with no efferent elicitation. This is
marked by “0” ipsilateral activation in Table 2, row 1. In contrast, in condition [Sm-Cont.] (Fig. 3(b))
noise remains on throughout the session, hence eliciting ipsilateral activation (with strength of 2/3,
as indicated in Table 2, row 2).


² A noteworthy observation is the existence of MOC neurons with binaural inputs (Liberman, 1988); the behavior of
these units is less well understood.
1.2. Noise characteristics, noise intensity and SNR. We used additive speech-shaped noise.
The SNR was calculated over the first 100 ms of the vowel (starting at the consonantal-to-vocalic
transition point of the initial dyad). Noise intensity, in dB SPL, was one parameter and was fixed
throughout a session. The other parameter in the experiment was the SNR; the intensity of the
word stimuli was adjusted (amplified or attenuated by a constant) to satisfy the nominal SNR
value. The range of noise intensities and SNRs will be detailed in Sec. C.1.d.

1.3.  Binaural  presentation.  The  signal  presented  to  the  ipsilateral  ear  is  the  same  as  in  the  monaural 
presentation.  The  contralateral  ear  receives  noise  alone,  with  the  same  intensity  as  the  noise  in 
the  ipsilateral  ear.  This  is  illustrated  in  Figs.  3(c)  and  3(d)  (for  [Sm-Gated]  and  [Sm-Cont.]  in  the 
ipsilateral  ear,  respectively).  Presumed  efferent  elicitation  is  indicated  in  rows  3  and  4  of  Table  2. 

2. Effect of binaural processing. A question arises whether the expected contribution to performance
should be attributed to the efferent system per se, or to an advantage resulting from processing by
binaural mechanisms upstream. To address this question, in all conditions that involve a presentation
to both ears (Table 2, rows 3 through 7) we used two kinds of contralateral noise: [Nu], where the noise in both
ears is uncorrelated (no binaural advantage), and [No], where the same noise realization is used in both ears (possible
binaural advantage).

3. Effect of integration in ascending pathway. As we found out in Experiment I [Sec. C.1.d, Table 3(a)],
performance for the binaural conditions deteriorated compared to the corresponding monaural
conditions. This result seems unexpected, since an additional contralateral efferent elicitation is
supposed to improve performance. One hypothesis that could explain this oddity is a
degradation in the representation of speech cues in the ascending pathway as a result of stimulating the
contralateral ear with noise only. Adding noise to the contra ear, in addition to the noisy speech
presented to the ipsilateral ear, may create a noisier integrated image of the speech input. To resolve
this issue we introduced condition [Sm-Cont.]-[Nu-Gated] (Fig. 3(f)), where the noise in the contra ear
is turned off during the time interval in which speech is presented to the ipsi ear, resulting in a speech
image provided by the ipsilateral path alone. If our hypothesis about the role of the efferent system is
correct, the performance in this condition (i.e., condition [Sm-Cont.]-[Nu-Gated]) should be at least as
good as in condition [Sm-Cont.] (i.e., monaural ipsi activation).

4. [Sm-Gated] vs. [Sm-Cont.] - fused vs. segregated auditory images? As we found out in Experiment I
(Sec. C.1.d), performance for condition [Sm-Cont.] (ipsilateral elicitation) is significantly better than that
measured for the baseline condition [Sm-Gated]. A question arises whether this advantage is due to
the efferent system per se, or to the involvement of auditory segregation mechanisms; a word
stimulus added to a continuously running noise (condition [Sm-Cont.]) may be segregated more easily than
a word stimulus that switches on at the same time instant as gated noise. To resolve this issue we
introduced the monaural condition [Sm-Gated-WGN] (Fig. 3(e)), where the silent gaps of condition
[Sm-Gated] are filled with white Gaussian noise, with the same intensity as the baseline noise. We
assume that, due to the markedly different characteristics of the two noise realizations, the image of
the noisy speech is as fused as in condition [Sm-Gated]. However, the efferent elicitation is as in
condition [Sm-Cont.]. (This is indicated in row 8 of Table 2.) If our hypothesis about the role of the
efferent system is correct, the performance in this condition (i.e., condition [Sm-Gated-WGN]) should
be as good as in condition [Sm-Cont.].
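
The SNR adjustment described in topic 1.2 above (noise level fixed, word stimulus scaled by a constant so that the SNR over the first 100 ms of the vowel meets the nominal value) can be sketched as follows. This is a minimal illustration under stated assumptions: the SNR is taken as an RMS ratio in dB, the noise RMS is measured over the same 100 ms window, and the signals, sampling rate and vowel-onset index are synthetic stand-ins (white Gaussian noise instead of speech-shaped noise, a tone instead of a word).

```python
import math
import random

def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def gain_for_snr(word, noise, vowel_onset, fs, target_snr_db, window_ms=100.0):
    """Constant gain for `word` that yields `target_snr_db` over the first
    `window_ms` of the vowel; the noise is left unchanged."""
    n = int(round(fs * window_ms / 1000.0))
    window = slice(vowel_onset, vowel_onset + n)
    current_snr_db = 20.0 * math.log10(rms(word[window]) / rms(noise[window]))
    return 10.0 ** ((target_snr_db - current_snr_db) / 20.0)

# Synthetic stand-ins (assumptions): a 200 Hz tone as the "word", Gaussian noise.
fs = 16000
rng = random.Random(0)
word = [0.1 * math.sin(2 * math.pi * 200 * i / fs) for i in range(4800)]
noise = [rng.gauss(0.0, 0.05) for _ in range(4800)]
vowel_onset = 800  # hypothetical transition point, 50 ms into the word

gain = gain_for_snr(word, noise, vowel_onset, fs, target_snr_db=5.0)
scaled_word = [gain * x for x in word]
mixture = [s + v for s, v in zip(scaled_word, noise)]  # additive noise
```

Because only the word is scaled, the noise intensity stays fixed throughout a session while the nominal SNR is met in the measurement window.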

C.1.d Results and data analysis

The study comprises two experiments, with Experiment II emerging out of Experiment I.

In Experiment I we measured the performance for four conditions: no efferent activation (the baseline
condition [Sm-Gated], illustrated in Fig. 3(a)); monaural, ipsilateral efferent activation (condition
[Sm-Cont.], Fig. 3(b)); and two conditions of binaural efferent activation (conditions [Sm-Cont.]-[Nu] and
[Sm-Cont.]-[No], Fig. 3(d)), one with uncorrelated noise in the contralateral ear, one with the same noise in
both ears (i.e., providing binaural advantage). Four subjects were tested (on their left ear). Their
performance is presented in Table 3(a) per subject, since performance is expected to vary across
subjects due to differences in the strength of their efferent reflex (e.g., Guinan et al., 2003). A table entry
shows the mean word-error³ ± its standard deviation. The table entry also contains, in parentheses, the
t-statistic for the difference between means, relative to the baseline condition. For the number of
degrees of freedom in the data here, a difference between the means is significant at the p<0.05 level if the
absolute value of the t-statistic is (roughly) greater than 2.5.
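
As an illustration of how a t-statistic of this kind can be computed from a mean and standard deviation per condition, the sketch below uses the standard two-sample formula for equal group sizes. The report does not state the exact procedure used; the formula, the sign convention (baseline minus condition) and the group size of 9 per condition (inferred from d.f. = 16) are assumptions.

```python
import math

def t_from_summary(mean_base, sd_base, mean_cond, sd_cond, n_per_condition):
    """Two-sample t-statistic for equal group sizes (baseline minus condition).

    With equal n, the pooled-variance standard error of the difference is
    sqrt((sd_base**2 + sd_cond**2) / n).
    """
    se = math.sqrt((sd_base ** 2 + sd_cond ** 2) / n_per_condition)
    return (mean_base - mean_cond) / se

def degrees_of_freedom(n_per_condition):
    """Degrees of freedom for two groups of equal size."""
    return 2 * n_per_condition - 2

# First entries of Table 3(a): baseline 0.36 +/- 0.08 vs. binaural activation
# with uncorrelated noise 0.42 +/- 0.04. The group size of 9 is hypothetical.
t = t_from_summary(0.36, 0.08, 0.42, 0.04, 9)
print(degrees_of_freedom(9))  # 16
print(round(t, 2))            # -2.01
```

For this pair of entries the sketch gives about -2.01, close to the tabulated -2.02; a positive value indicates fewer errors than in the baseline condition.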

The results show a significant reduction in the mean number of errors for the ipsilateral efferent activation
compared to no efferent activation, for all subjects (at least 25%). Using the t-test criterion, the difference
between the means (ipsilateral condition vs. baseline) is significant. To our surprise, adding continuous
contralateral noise resulted in a degradation in performance compared to the ipsilateral (monaural) efferent
activation condition, consistent across subjects (rows 3 and 4). This was the case for either uncorrelated
noise to the contra ear, or the same noise to both ears (the latter should provide a binaural advantage). From
the efferent elicitation viewpoint, these results seem unexpected since adding noise to the contra ear should
strengthen the efferent response to the ipsilateral cochlea and should, therefore, improve performance⁴.
(The t-test here indicates an insignificant difference between the means, compared to baseline.) Two more
observations are noteworthy. First, as expected, performance varies across subjects, but the trend
as a function of efferent elicitation condition is similar. Second, the question arises whether the advantage in
the ipsilateral condition is due to the efferent system per se; it may be that words presented to the
ipsilateral ear in the presence of continuous noise are segregated more easily than a
word stimulus that switches on at the same time instant as gated noise (as in the baseline condition).

In Experiment II we addressed two questions that emerged out of Experiment I: (1) what is the reason for
the degradation in performance for binaural efferent activation, and (2) is the advantage for ipsilateral
efferent activation due to differences in the nature of the projected auditory images (i.e., “fused” vs.
segregated)? For the first question we hypothesize that the reason for the drop in performance is a
further corruption of speech cues in the ascending pathway, resulting from presenting extra noise (with no
extra signal) to the contra ear. To test this hypothesis we turned the noise in the contra ear off during the
time interval when speech was presented to the ipsi ear (condition [Sm-Cont.]-[Nu-Gated], Fig. 3(f)).
Results suggest restoration of the performance level, closer to the level of performance measured for the
ipsi efferent activation⁵. The second question was addressed by creating a monaural signal where the
added noise during the word stimuli is speech-shaped, while the noise in between the
word stimuli is white (condition [Sm-Gated-WGN], Fig. 3(e)). In this way the ipsi efferent elicitation is
maintained while the image of the noisy word stimulus is fused (due to the different nature of the noise
segments). Results suggest a performance level close to the level of performance
measured for the ipsi efferent activation.

Finally, errors collected in DRT sessions can be averaged over subjects and plotted as a function of the
Jakobsonian acoustic-phonetic dimensions, as illustrated in Figures 4(a) and 4(b). The panels in each
figure represent different efferent elicitation conditions. Fig. 4(a) is for word-pairs in the High-Front vowel
quadrant, and Fig. 4(b) is for word-pairs in the Low-Front vowel quadrant. Using knowledge about the
acoustic correlates of the Jakobsonian dimensions, the information in Fig. 4 can be used to identify the
origins of the errors in the time-frequency plane. Notice the difference in error distributions between Fig.
4(a) and Fig. 4(b), and the difference between panels within each figure. These differences may result
from different efferent effects depending on the location of the formants (different in each vowel
quadrant).


³ The range of the mean word-error is [0, 3].

⁴ The degradation in performance when adding noise to the contralateral ear is also unexpected from the binaural
processing  viewpoint.  Results  from  detection  studies  indicate  little  or  no  additional  masking  from  a  contralateral 
noise.  When  the  contralateral  noise  is  in-phase,  in  fact,  there  is  an  improvement  in  detection  (Durlach  and  Colburn, 
1978).  The  absence  of  improvement  here  may  be  due  to  the  fact  that  the  cues  for  consonant  discrimination  tend  to 
be  above  1500  Hz,  where  binaural  detection  improvements  are  negligible. 

⁵ Only two subjects were tested in Experiment II.
Table 1. Samples of word-pairs used in Voiers’ DRT (1983)

Voicing (VC)               Nasality (NS)      Sustention (ST)
(Voiced - Unvoiced)        (Nasal - Oral)     (Sustained - Interrupted)
veal - feel                meat - beat        vee - bee
zed - said                 neck - deck        fence - pence

Sibilation (SB)            Graveness (GV)     Compactness (CM)
(Sibilated - Unsibilated)  (Grave - Acute)    (Compact - Diffuse)
cheep - keep               peak - teak        key - tea
jot - got                  wad - rod          got - dot

Table 2. Presumed efferent activation. See Fig. 3 for description of signal condition.

Fig. 3 panel   Stimulus condition        Ipsi activation   Contra activation
(a)            [Sm-Gated]                0                 [illegible]
(b)            [Sm-Cont.]                2/3               0
(c)            [Sm-Gated]-[Nu]           [illegible]       [illegible]
(d)            [Sm-Cont.]-[Nu]           [illegible]       [illegible]
(f)            [Sm-Cont.]-[Nu-Gated]     2/3               1/3
[illegible]    [Sm-Cont.]-[No-Gated]     [illegible]       [illegible]
Table 3(a). Experiment I. Mean word errors (out of 3 words) ± standard deviation. In parentheses is the t-value of the
difference between the means relative to the baseline condition (row 1). Columns - subjects. Entries in each row are listed in column order.

Presumed efferent activation: EB (d.f.=16); JZ (d.f.=16); OC (d.f. illegible); [one subject header illegible]
No activation: 0.36±0.08; 0.38±0.09; 0.41±0.05; 0.53±0.11
Ipsilateral activation: 0.28±0.04 (3.32); 0.29±0.06 (3.74); 0.31±0.04 (4.91); [one entry illegible]
Binaural activation, uncorrelated noise: 0.42±0.04 (-2.02); 0.37±0.06 (0.52); 0.36±0.04 (2.06); 0.37±0.06 (3.35)
Binaural, same noise in both ears: 0.34±0.06 (0.49); 0.32±0.05 (1.89); 0.39±0.08 (0.61); 0.35±0.08 (3.44)

Table 3(b). Experiment II. Entries in each row are listed in column order; "-" marks an empty entry.

Presumed efferent activation: JZ (d.f.=16); OC (d.f. illegible)
Binaural, gated noise in contra ear: -; 0.33±0.03 (1.61); -; 0.31±0.04 (4.75)
Ipsilateral activation, "fused" image: -; 0.32±0.03 (1.16); -; 0.29±0.08 (4.63)
Figure 1. The consonantal part of each word stimulus was truncated to a maximum of 50 ms (measured backwards from the
time instant of the consonantal-to-vocalic transition).


Figure 2. Illustration of the MOC neurons wiring - a block diagram. Panels: ipsilateral activation, contralateral
activation, binaural activation.
Figure 3. Illustration of signal conditions. Sm stands for Signal-monaural; N stands for noise at ipsi ear; Nu stands
for noise at contralateral ear, uncorrelated to noise at ipsilateral ear; No - correlated noise at contralateral ear (N, Nu
and No are speech-shaped noise signals); WGN stands for white Gaussian noise.
Figure 4(a). Error patterns for High-Front vowels in the DRT database. Noise is speech-shaped, with intensity of 60 dB SPL.
SNR is 5 dB. Abscissa - Jakobsonian acoustic features VC, NS, ST, SB, GV, CM (+ for attribute present, - for attribute absent).
Ordinate - mean errors, in words (out of 3 words per bin).


Figure  4(b).  Error  patterns  for  Low-Front  vowels  in  the  DRT  database. 